
When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity


Abstract

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher order" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models, and requires the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable with high probability in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition.
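
To make the matching requirement concrete, here is a minimal Python sketch of one possible reading of it. It is an illustration only, not the paper's formal condition or algorithm. It assumes that the topic-word matrix is summarized by its sparsity pattern (the set of words each topic can emit), that a "higher order observed word" means an unordered pair of distinct words, and that a topic is adjacent to a pair when both words are in its support. The names `pair_neighborhoods` and `has_perfect_matching` are hypothetical.

```python
from itertools import combinations


def pair_neighborhoods(support):
    """For each topic, list the unordered word pairs inside its support.

    support[t] is the set of word indices with nonzero weight under topic t
    (an assumed encoding of the topic-word sparsity pattern).
    """
    return [set(combinations(sorted(words), 2)) for words in support]


def has_perfect_matching(neighborhoods):
    """Return True if every topic can be matched to a distinct neighbor.

    Uses the standard augmenting-path method for bipartite matching.
    """
    match = {}  # neighbor (word or word pair) -> topic currently holding it

    def try_assign(topic, seen):
        for item in neighborhoods[topic]:
            if item in seen:
                continue
            seen.add(item)
            # Take a free neighbor, or displace its holder if the holder
            # can be moved to another neighbor.
            if item not in match or try_assign(match[item], seen):
                match[item] = topic
                return True
        return False

    return all(try_assign(t, set()) for t in range(len(neighborhoods)))


# Toy overcomplete example: 5 topics over a 4-word vocabulary.
support = [{0, 1}, {0, 2}, {1, 2}, {2, 3}, {0, 3}]

# Topics cannot all be matched to distinct single words (5 topics, 4 words).
first_order_ok = has_perfect_matching([set(s) for s in support])

# They can be matched to distinct word pairs in this example.
second_order_ok = has_perfect_matching(pair_neighborhoods(support))
```

In this toy example a first-order matching is ruled out by counting alone, while the second-order check succeeds because each topic covers a different word pair. Under the assumptions above, this is the sense in which a higher-order condition leaves room for more topics than words.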
